@MatiasHiltunen MatiasHiltunen commented Jun 20, 2025

The primary purpose of this PR is to add an example that demonstrates another way to use an audio input stream, while also offering a tool to visualize the stream. Although the spectrogram example contains a fair amount of code and features, it may be useful for someone like me who is just entering the world of lower-level audio. Currently the code uses the OS's default input device.

While developing this example I noticed that on Windows, with the WASAPI host, AGC or noise suppression quickly comes into play when visualizing real-time audio input, even though build_input_stream_raw_inner should supposedly give a raw audio stream if the OS or driver agrees to that. To get truly raw input audio on Windows with WASAPI, I added the ability to request a raw audio stream, gated behind a new environment variable (stored globally in a OnceLock). With that in place I was able to disable AGC/noise suppression on Windows 11 and get a truly raw stream, which allowed the spectrogram to run indefinitely without disturbance from OS- or driver-level filters. I did not face similar challenges on macOS, where the audio stream was seemingly untouched, or at least any processing did not affect it at runtime. On Windows, possible use cases include longer-running audio recordings where the volume and quality should stay constant, or applications that want to handle gain and noise suppression themselves. Windows seems to start lowering the input volume after a certain period of inactivity.
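To illustrate the "environment variable behind a global OnceLock" idea described above, here is a minimal sketch. The variable name `EXAMPLE_WASAPI_RAW_INPUT` and the helper functions are assumptions made for illustration; the description does not state the actual variable name, and this is not the code from the PR.

```rust
use std::sync::OnceLock;

// Hypothetical variable name, chosen only for this sketch.
const RAW_INPUT_ENV: &str = "EXAMPLE_WASAPI_RAW_INPUT";

// Process-wide cache: the environment is consulted at most once.
static RAW_INPUT: OnceLock<bool> = OnceLock::new();

/// Interprets an optional variable value as an on/off switch.
/// "1" and "true" (any ASCII case) enable the option; anything else disables it.
fn parse_flag(value: Option<&str>) -> bool {
    match value.map(str::trim) {
        Some(v) => v == "1" || v.eq_ignore_ascii_case("true"),
        None => false,
    }
}

/// Returns whether raw input was requested via the environment.
/// The first call reads the variable; later calls reuse the cached result.
fn raw_input_requested() -> bool {
    *RAW_INPUT.get_or_init(|| {
        let value = std::env::var(RAW_INPUT_ENV).ok();
        parse_flag(value.as_deref())
    })
}
```

Because the value is cached on first use, changing the variable after the first call has no effect in this sketch, which is one reason a global switch of this kind can feel clumsy as a configuration mechanism.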

The spectrogram example has been briefly tested on real devices: Mac Mini M4 (macOS 15.5 Sequoia), Linux and Windows 11.

The example is built with existing dependencies; the only change to Cargo.toml so far is the addition of libc to the macOS dev-dependencies, which allows creating a TUI app with minimal dependencies.

Check the comments in the code for additional information. This example has been reviewed over multiple runs with a number of different LLMs, such as Claude 4 Opus.

Br.
Matias

@MatiasHiltunen
Author

Screenshot 2025-06-22 210145
Reference image of a zoomed-out terminal on Windows while running this example.

@roderickvd
Member

This is really cool and I can definitely see it landing somewhere in the Rust audio ecosystem. I'm wondering if Rodio would be the best place to get it landed?

Normally I'd think so, but this PR also adds raw access on WASAPI. That's something Rodio does not have access to, and actually, cpal today doesn't have access to it either.

So offering additional WASAPI knobs is interesting, though I'm not a fan of the approach with the environment variable. What else can we think of that's more idiomatic - in the sense of a builder pattern, host/stream configuration options, or the like? I can imagine that other hosts also could have knobs that are worthwhile exposing, so I'd be interested to see what we could conjure up.

Then maybe split the PR into a spectrogram for Rodio and host options in cpal?
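As a rough sketch of what a builder-style alternative to the environment variable might look like: the `InputStreamOptions` type and its methods below are hypothetical names invented for illustration and are not part of cpal's current API.

```rust
/// Hypothetical per-stream options; not an existing cpal type.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct InputStreamOptions {
    // When true, the host would be asked for an unprocessed (raw) stream.
    raw: bool,
}

impl InputStreamOptions {
    /// Starts from defaults, which leave OS/driver processing untouched.
    pub fn new() -> Self {
        Self::default()
    }

    /// Opts in to (or out of) requesting a raw input stream.
    pub fn raw(mut self, enabled: bool) -> Self {
        self.raw = enabled;
        self
    }

    /// Reports whether a raw stream was requested.
    pub fn is_raw(&self) -> bool {
        self.raw
    }
}
```

A caller would write something like `InputStreamOptions::new().raw(true)` and pass the result to whichever stream-building function a host chose to expose it on. Keeping the option on a value rather than in global state would also let other hosts add their own knobs later without affecting existing callers.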

@wgibbs-rs
Member

Forgive me if I'm misunderstanding, but is this implementing a way to create a spectrogram, built into cpal?

I like the idea of a fast way to generate a spectrogram, and I would 100% use that in the near future, but I feel like that dilutes the goal of cpal, which I understood to be as low level as possible for directly creating audio I/O; cpal is what OpenGL is to SwiftUI.

If this is just for an example though, that, I believe, is a really good feature to introduce, and showcases a little bit more of what CPAL can do for non-input-to-output features.

@MatiasHiltunen
Author

MatiasHiltunen commented Aug 2, 2025

@roderickvd Thanks for the ideas! I'll split the PR in the near future. I agree that an environment variable is quite a clumsy way to force raw input with WASAPI; would a feature flag, for example, be a better option in this case? I'll look into the Rodio option and might create a PR there later!

@wgibbs-rs Spectrogram would be just an example, no integration to cpal :)

@roderickvd
Member

> @roderickvd Thanks for the ideas! I'll split the PR in the near future. I agree that an environment variable is quite a clumsy way to force raw input with WASAPI; would a feature flag, for example, be a better option in this case? I'll look into the Rodio option and might create a PR there later!

Cool. Yes, a feature flag could also work.
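For comparison, a compile-time switch could be sketched roughly as follows. The feature name `wasapi-raw-input` and the function are assumptions for illustration only; no such Cargo feature is confirmed to exist in cpal.

```rust
/// True only when the crate is compiled with the hypothetical
/// `wasapi-raw-input` Cargo feature enabled.
pub const RAW_INPUT_COMPILED_IN: bool = cfg!(feature = "wasapi-raw-input");

/// Decides whether the host should ask for a raw input stream.
/// With a feature flag the decision is fixed at build time.
pub fn request_raw_input() -> bool {
    RAW_INPUT_COMPILED_IN
}
```

One trade-off worth keeping in mind is that Cargo unifies features across the dependency graph, so a single crate enabling the feature would turn it on for every user of the library in that build.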

Thinking out loud, is there any reason why a user would not want it? Or would this be going into too much of opinionated territory?

@MatiasHiltunen
Author

> > @roderickvd Thanks for the ideas! I'll split the PR in the near future. I agree that an environment variable is quite a clumsy way to force raw input with WASAPI; would a feature flag, for example, be a better option in this case? I'll look into the Rodio option and might create a PR there later!
>
> Cool. Yes, a feature flag could also work.
>
> Thinking out loud, is there any reason why a user would not want it? Or would this be going into too much of opinionated territory?

If you are referring to the changes in WASAPI's build_input_stream_raw_inner function, it could well be what users actually expect, but I would not add this without a way to explicitly enable it just yet.

@roderickvd
Member

I share the feeling that we could make it opt-in for now and consider transitioning to making it the default later.

Just to rationalize it though, what would be pros/cons?
